object detection
- Europe > Poland (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
PrObeD: Proactive Object Detection Wrapper
These works are regarded as passive works for object detection as they take the input image as is. However, convergence to global minima is not guaranteed to be optimal in neural networks; therefore, we argue that the trained weights in the object detector are not optimal. To rectify this problem, we propose a wrapper based on proactive schemes, PrObeD, which enhances the performance of these object detectors by learning a signal. PrObeD consists of an encoder-decoder architecture, where the encoder network generates an image-dependent signal termed templates to encrypt the input images, and the decoder recovers this template from the encrypted images. We propose that learning the optimum template results in an object detector with an improved detection performance. The template acts as a mask to the input images to highlight semantics useful for the object detector. Finetuning the object detector with these encrypted images enhances the detection performance for both generic and camouflaged.
- South America > Brazil (0.04)
- North America > United States > Michigan (0.04)
- Asia (0.04)
- Information Technology (0.47)
- Health & Medicine (0.46)
Flow-based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection - Appendix Haibao Y u 1, 2, Yingjuan T ang
Mean A verage Precision (mAP). For VIC3D object detection, we focus on the obstacles around the ego vehicle. There are two metrics used for evaluation: BEV@mAP and 3D@mAP . BEV@mAP evaluates the 3D boxes in the bird's-eye view and ignores the In our implementation, we ignore the transmission cost of calibration files and timestamps. For early fusion, we calculate the transmission cost of transmitting raw data.
Described Object Detection: Liberating Object Detection with Flexible Expressions
Detecting objects based on language information is a popular task that includes Open-Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance them to a more practical setting called *Described Object Detection* (DOD) by expanding category names to flexible language expressions for OVD and overcoming the limitation of REC only grounding the pre-existing object. We establish the research foundation for DOD by constructing a *Description Detection Dataset* ($D^3$). This dataset features flexible language expressions, whether short category names or long descriptions, and annotating all described objects on all images without omission. By evaluating previous SOTA methods on $D^3$, we find some troublemakers that fail current REC, OVD, and bi-functional methods. REC methods struggle with confidence scores, rejecting negative instances, and multi-target scenarios, while OVD methods face constraints with long and complex descriptions. Recent bi-functional methods also do not work well on DOD due to their separated training procedures and inference strategies for REC and OVD tasks. Building upon the aforementioned findings, we propose a baseline that largely improves REC methods by reconstructing the training data and introducing a binary classification sub-task, outperforming existing methods. Data and code are available at https://github.com/shikras/d-cube
UMB: Understanding Model Behavior for Open-World Object Detection
Open-World Object Detection (OWOD) is a challenging task that requires the detector to identify unlabeled objects and continuously demands the detector to learn new knowledge based on existing ones. Existing methods primarily focus on recalling unknown objects, neglecting to explore the reasons behind them. This paper aims to understand the model's behavior in predicting the unknown category. First, we model the text attribute and the positive sample probability, obtaining their empirical probability, which can be seen as the detector's estimation of the likelihood of the target with certain known attributes being predicted as the foreground. Then, we jointly decide whether the current object should be categorized in the unknown category based on the empirical, the in-distribution, and the out-of-distribution probability. Finally, based on the decision-making process, we can infer the similarity of an unknown object to known classes and identify the attribute with the most significant impact on the decision-making process. This additional information can help us understand the behavior of the model's prediction in the unknown class. The evaluation results on the Real-World Object Detection (RWD) benchmark, which consists of five real-world application datasets, show that we surpassed the previous state-of-the-art (SOTA) with an absolute gain of 5.3 mAP for unknown classes, reaching 20.5 mAP. Our code is available at https://github.com/xxyzll/UMB.
ALDI-ray: Adapting the ALDI Framework for Security X-ray Object Detection
Heidari, Omid Reza, Wang, Yang, Zuo, Xinxin
Domain adaptation in object detection is critical for real-world applications where distribution shifts degrade model performance. Security X-ray imaging presents a unique challenge due to variations in scanning devices and environmental conditions, leading to significant domain discrepancies. To address this, we apply ALDI++, a domain adaptation framework that integrates self-distillation, feature alignment, and enhanced training strategies to mitigate domain shift effectively in this area. We conduct extensive experiments on the EDS dataset, demonstrating that ALDI++ surpasses the state-of-the-art (SOTA) domain adaptation methods across multiple adaptation scenarios. In particular, ALDI++ with a Vision Transformer for Detection (ViTDet) backbone achieves the highest mean average precision (mAP), confirming the effectiveness of transformer-based architectures for cross-domain object detection. Additionally, our category-wise analysis highlights consistent improvements in detection accuracy, reinforcing the robustness of the model across diverse object classes. Our findings establish ALDI++ as an efficient solution for domain-adaptive object detection, setting a new benchmark for performance stability and cross-domain generalization in security X-ray imagery.
- North America > Canada > Quebec > Montreal (0.40)
- North America > United States > Oklahoma > Beaver County (0.04)
PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving
Rezaei, Abdolazim, Sookhak, Mehdi
This study introduces PEFT -DML, a parameter-efficient deep metric learning framework for robust multi-modal 3D object detection in autonomous driving. Unlike conventional models that assume fixed sensor availability, PEFT -DML maps diverse modalities (LiDAR, radar, camera, IMU, GNSS) into a shared latent space, enabling reliable detection even under sensor dropout or unseen modality-class combinations. By integrating Low-Rank Adaptation (LoRA) and adapter layers, PEFT -DML achieves significant training efficiency while enhancing robustness to fast motion, weather variability, and domain shifts. Experiments on benchmarks nuScenes demonstrate superior accuracy.
- Transportation > Ground > Road (0.64)
- Information Technology > Robotics & Automation (0.64)
- Automobiles & Trucks (0.64)
Maritime Small Object Detection from UAVs using Deep Learning with Altitude-Aware Dynamic Tiling
Unmanned Aerial Vehicles (UAVs) are crucial in Search and Rescue (SAR) missions due to their ability to monitor vast maritime areas. However, small objects often remain difficult to detect from high altitudes due to low object-to-background pixel ratios. We propose an altitude-aware dynamic tiling method that scales and adaptively subdivides the image into tiles for enhanced small object detection. By integrating altitude-dependent scaling with an adaptive tiling factor, we reduce unnecessary computation while maintaining detection performance. Tested on the SeaDronesSee dataset [1] with YOLOv5 [2] and Slicing Aided Hyper Inference (SAHI) framework [3], our approach improves Mean Average Precision (mAP) for small objects by 38% compared to a baseline and achieves more than double the inference speed compared to static tiling. This approach enables more efficient and accurate UAV-based SAR operations under diverse conditions.
- Information Technology > Robotics & Automation (0.34)
- Aerospace & Defense > Aircraft (0.34)